R Markdown

1. Large fire predictor

Through EDA, we found that most wildfires in California fall into fire size classes A and B. Fires of these sizes may pose little threat to the environment or to property, since they tend to die out quickly. Fires of class C and above can be dangerous because the burned area grows fast.

Preparing the data

Each day from 1992 to 2015 is labeled according to whether a large fire occurred. Each observation has three features: daily temperature, soil moisture, and rainfall, each averaged over California.

## 'data.frame':    23376 obs. of  5 variables:
##  $ time               : chr  "1950-01-01 00:00:00+00:00" "1950-01-02 00:00:00+00:00" "1950-01-03 00:00:00+00:00" "1950-01-04 00:00:00+00:00" ...
##  $ tair_day_livneh_vic: num  4.14 1.77 -3.09 -3.59 -2.75 ...
##  $ month              : chr  "01" "01" "01" "01" ...
##  $ year               : chr  "1950" "1950" "1950" "1950" ...
##  $ DOY                : chr  "001" "002" "003" "004" ...
## 'data.frame':    23376 obs. of  6 variables:
##  $ year                     : int  1950 1950 1950 1950 1950 1950 1950 1950 1950 1950 ...
##  $ DOY                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ tair_day_livneh_vic      : num  4.14 1.77 -3.09 -3.59 -2.75 ...
##  $ month                    : chr  "01" "01" "01" "01" ...
##  $ soilmoist1_day_livneh_vic: num  18.3 18.5 18.3 18.3 18.1 ...
##  $ rainfall_day_livneh_vic  : num  0.67193 0.79325 0.07883 0.08991 0.00174 ...
## 'data.frame':    8036 obs. of  8 variables:
##  $ year                     : int  1992 1992 1992 1992 1992 1992 1992 1992 1992 1992 ...
##  $ DOY                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ tair_day_livneh_vic      : num  3.78 4.17 4.19 4.87 4.94 ...
##  $ month                    : chr  "01" "01" "01" "01" ...
##  $ soilmoist1_day_livneh_vic: num  20.2 20.5 21.4 23.6 25.1 ...
##  $ rainfall_day_livneh_vic  : num  0.0616 1.1337 3.0754 7.0797 10.8035 ...
##  $ n                        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ fire                     : num  0 0 0 0 0 0 0 0 0 0 ...
## 'data.frame':    8036 obs. of  8 variables:
##  $ year                     : num  1992 1992 1992 1992 1992 ...
##  $ DOY                      : num  2 3 4 5 6 7 8 9 10 11 ...
##  $ tair_day_livneh_vic      : num  3.78 4.17 4.19 4.87 4.94 ...
##  $ month                    : chr  "01" "01" "01" "01" ...
##  $ soilmoist1_day_livneh_vic: num  20.2 20.5 21.4 23.6 25.1 ...
##  $ rainfall_day_livneh_vic  : num  0.0616 1.1337 3.0754 7.0797 10.8035 ...
##  $ n                        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ fire                     : num  0 0 0 0 0 0 0 0 0 0 ...

Large fire prediction model, by logit regression

Logit Regression : fire ~ tair_day_livneh_vic + soilmoist1_day_livneh_vic
                             Estimate  Std. Error     z value    Pr(>|z|)
(Intercept)                    3.0691      0.3117      9.8458           0
tair_day_livneh_vic            0.1461      0.0080     18.1739           0
soilmoist1_day_livneh_vic     -0.3157      0.0145    -21.7253           0
## Area under the curve: 0.9017
Confusion matrix for large fire predictor
          Predicted 0  Predicted 1  Total
Actual 0         3265          659   3924
Actual 1          692         3420   4112
Total            3957         4079   8036
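
The confusion matrix for the two-predictor model can be summarized with a few standard rates. The following is a minimal Python sketch that only re-uses the four counts reported above; the helper name `confusion_metrics` is an illustrative assumption, and the classification threshold behind the matrix is not stated in the report, so the rates apply to whatever threshold was used.

```python
# Counts copied from the confusion matrix of the two-predictor model
# (rows = actual class, columns = predicted class).
tn, fp = 3265, 659   # actual 0: predicted 0, predicted 1
fn, tp = 692, 3420   # actual 1: predicted 0, predicted 1


def confusion_metrics(tn, fp, fn, tp):
    """Return common summary rates for a 2x2 confusion matrix (hypothetical helper)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,      # share of all days classified correctly
        "sensitivity": tp / (tp + fn),      # share of large-fire days that were caught
        "specificity": tn / (tn + fp),      # share of non-fire days that were caught
        "precision": tp / (tp + fp),        # share of predicted fire days that were real
    }


metrics = confusion_metrics(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the two classes are close to balanced here (3924 versus 4112 days), accuracy is a reasonable headline number; with a rarer positive class, sensitivity and precision would deserve more weight.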

To make the model more useful in practice, we also fit a model that predicts the probability of a large fire on the next day:

Predicting Probability : fire ~ tair_day_livneh_vic + soilmoist1_day_livneh_vic + rainfall_day_livneh_vic
                             Estimate  Std. Error     z value    Pr(>|z|)
(Intercept)                    2.4707      0.3245      7.6148           0
tair_day_livneh_vic            0.1508      0.0081     18.6813           0
soilmoist1_day_livneh_vic     -0.2743      0.0158    -17.3927           0
rainfall_day_livneh_vic       -0.1312      0.0256     -5.1318           0
## Area under the curve: 0.9002
Confusion matrix for large fire predictor
          Predicted 0  Predicted 1  Total
Actual 0         3244          680   3924
Actual 1          689         3423   4112
Total            3933         4103   8036
Logit Regression : fire ~ tair_day_livneh_vic
                             Estimate  Std. Error     z value    Pr(>|z|)
(Intercept)                   -3.7380      0.0821    -45.5091           0
tair_day_livneh_vic            0.2833      0.0058     48.6031           0
## Area under the curve: 0.8799
Confusion matrix for large fire predictor
          Predicted 0  Predicted 1  Total
Actual 0         3222          702   3924
Actual 1          780         3332   4112
Total            4002         4034   8036

The results suggest that, using today's temperature, rainfall, and soil moisture data, we obtain an AUC of 0.90 for predicting a large fire on the next day. A more convenient model that requires only temperature to predict the probability reaches an AUC of 0.88.
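
To show how the temperature-only model turns a temperature into a probability, here is a minimal Python sketch that applies the inverse-logit function to the two coefficients reported for `fire ~ tair_day_livneh_vic`. The function name `large_fire_probability` and the example temperatures are illustrative assumptions; temperatures are in whatever units `tair_day_livneh_vic` uses.

```python
import math

# Coefficients copied from the temperature-only logit fit reported above.
INTERCEPT = -3.7380
BETA_TEMP = 0.2833


def large_fire_probability(temp):
    """Inverse-logit of the fitted linear predictor for one temperature value."""
    linear = INTERCEPT + BETA_TEMP * temp
    return 1.0 / (1.0 + math.exp(-linear))


# Temperature at which the fitted probability crosses 0.5 (linear predictor = 0).
break_even_temp = -INTERCEPT / BETA_TEMP

for temp in (5.0, 15.0, 25.0):  # illustrative values, not taken from the data
    print(f"temp={temp:5.1f}  P(large fire)={large_fire_probability(temp):.3f}")
print(f"P = 0.5 at temp = {break_even_temp:.2f}")
```

The positive slope means the fitted probability rises monotonically with temperature, and the break-even temperature is the point where the model switches from predicting "no large fire" to "large fire" under a 0.5 cutoff.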

Multinomial Regression Models

With our training data, we achieve an accuracy of 50.9% for predicting the fire size class of a wildfire based on that month’s conditions. From our confusion matrix, it seems that our model is biased towards predicting that a fire is of Class A, which makes sense because smaller fires are much more frequent than larger fires.

## # weights:  35 (24 variable)
## initial  value 239477.324313 
## iter  10 value 125587.165704
## iter  20 value 124592.703002
## iter  30 value 120564.282119
## iter  40 value 120548.543103
## final  value 120548.467144 
## converged
## Call:
## multinom(formula = FIRE_SIZE_CLASS ~ tair_day_livneh_vic + soilmoist1_day_livneh_vic + 
##     rainfall_day_livneh_vic, data = train)
## 
## Coefficients:
##   (Intercept) tair_day_livneh_vic soilmoist1_day_livneh_vic
## B   0.3101005        -0.007419344              -0.026035411
## C  -1.9065189         0.019356089              -0.053193118
## D  -3.3533836         0.022336804              -0.065310667
## E  -5.3151846         0.066847961              -0.027862226
## F  -6.6511606         0.094225716              -0.002200008
## G  -5.3309192         0.079879591              -0.137927206
##   rainfall_day_livneh_vic
## B             -0.10437630
## C             -0.11323666
## D             -0.08439748
## E             -0.05374799
## F             -0.10362338
## G             -0.48551227
## 
## Std. Errors:
##   (Intercept) tair_day_livneh_vic soilmoist1_day_livneh_vic
## B  0.08114983         0.001708254               0.004249007
## C  0.18724290         0.003912611               0.010013040
## D  0.39259298         0.008180648               0.021150644
## E  0.55173621         0.011651716               0.029485386
## F  0.71839662         0.015299765               0.038257280
## G  1.08148894         0.022456543               0.061673134
##   rainfall_day_livneh_vic
## B             0.007735755
## C             0.020840469
## D             0.042846014
## E             0.055929710
## F             0.079680567
## G             0.192757749
## 
## Residual Deviance: 241096.9 
## AIC: 241144.9
## [1] 50.95

On our testing data, we achieve a similar accuracy of 50.8%.

## [1] 50.84
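
To illustrate how the multinomial coefficients above become class probabilities, the following Python sketch applies the softmax formula with class A as the baseline (its linear predictor is fixed at 0, which is why the output lists only rows B to G). The function name `class_probabilities` and the input values are hypothetical and chosen only for illustration.

```python
import math

# Coefficients copied from the multinom() output above.
# Each tuple is (intercept, tair, soilmoist1, rainfall); class A is the baseline.
COEFS = {
    "B": (0.3101005, -0.007419344, -0.026035411, -0.10437630),
    "C": (-1.9065189, 0.019356089, -0.053193118, -0.11323666),
    "D": (-3.3533836, 0.022336804, -0.065310667, -0.08439748),
    "E": (-5.3151846, 0.066847961, -0.027862226, -0.05374799),
    "F": (-6.6511606, 0.094225716, -0.002200008, -0.10362338),
    "G": (-5.3309192, 0.079879591, -0.137927206, -0.48551227),
}


def class_probabilities(tair, soilmoist, rainfall):
    """Softmax over the per-class linear predictors, with baseline A scored as 0."""
    scores = {"A": 0.0}
    for cls, (b0, b_t, b_s, b_r) in COEFS.items():
        scores[cls] = b0 + b_t * tair + b_s * soilmoist + b_r * rainfall
    denom = sum(math.exp(s) for s in scores.values())
    return {cls: math.exp(s) / denom for cls, s in scores.items()}


# Hypothetical conditions, not taken from the dataset.
probs = class_probabilities(tair=20.0, soilmoist=15.0, rainfall=0.1)
predicted = max(probs, key=probs.get)
for cls, p in probs.items():
    print(f"class {cls}: {p:.4f}")
print("predicted class:", predicted)
```

For these inputs every non-baseline score is negative, so class A receives the largest probability, which is consistent with the bias towards Class A noted in the confusion matrix discussion.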

Next, we try to predict the cause of the wildfire based on the conditions of that month.

## # weights:  78 (60 variable)
## initial  value 450638.517563 
## iter  10 value 351990.198964
## iter  20 value 350236.347991
## iter  30 value 349300.527741
## iter  40 value 348573.289051
## iter  50 value 347430.734728
## iter  60 value 339582.375984
## iter  70 value 338693.455512
## iter  80 value 338687.765856
## final  value 338687.730830 
## converged
## Call:
## multinom(formula = STAT_CAUSE_CODE ~ month + tair_day_livneh_vic + 
##     soilmoist1_day_livneh_vic + rainfall_day_livneh_vic, data = data_wildfire)
## 
## Coefficients:
##    (Intercept)      month tair_day_livneh_vic soilmoist1_day_livneh_vic
## 2    8.1322214 -0.1577226          -0.2580958               -0.07582529
## 3    6.7364952 -0.1506793          -0.2671677               -0.10486743
## 4    7.0614347 -0.1431805          -0.2808147               -0.08389730
## 5    7.0942085 -0.1841204          -0.3386374                0.03907005
## 6    5.0782447 -0.1815844          -0.2515738               -0.14392114
## 7    7.4709575 -0.1539434          -0.2673771               -0.06961042
## 8    6.2183748 -0.1596796          -0.2814460               -0.02678882
## 9    8.8417649 -0.1542511          -0.2826069               -0.07942694
## 10   0.5667165 -0.2508201          -0.1473874               -0.01235681
## 11   2.9853740 -0.1429962          -0.2551327                0.03213029
## 12   3.4646714 -0.1985265          -0.3108203               -0.07492545
## 13   4.8432777 -0.1762985          -0.2525367                0.07264006
##    rainfall_day_livneh_vic
## 2               -0.4584508
## 3               -0.5247959
## 4               -0.3523235
## 5               -0.5507486
## 6               -0.5633070
## 7               -0.4611825
## 8               -0.5196944
## 9               -0.4275431
## 10              -0.8616573
## 11              -0.3378871
## 12              -0.5633781
## 13              -0.5864476
## 
## Std. Errors:
##    (Intercept)       month tair_day_livneh_vic soilmoist1_day_livneh_vic
## 2    0.1750034 0.006421453         0.003238870               0.008398485
## 3    0.2784172 0.009429017         0.004994638               0.013341810
## 4    0.2314689 0.007877799         0.004249215               0.011052364
## 5    0.1985661 0.006814905         0.003768922               0.009445766
## 6    0.7188434 0.023927300         0.012440760               0.034873084
## 7    0.1964082 0.006987880         0.003617004               0.009397018
## 8    0.2483275 0.008321274         0.004586986               0.011780972
## 9    0.1697367 0.006247095         0.003158474               0.008140895
## 10   1.4665606 0.054808849         0.026711216               0.069498223
## 11   0.4875786 0.014975748         0.009213650               0.022605061
## 12   1.5695856 0.047477926         0.028406177               0.075303662
## 13   0.2225888 0.007652354         0.004187189               0.010472280
##    rainfall_day_livneh_vic
## 2               0.01212778
## 3               0.02501050
## 4               0.01548083
## 5               0.01385956
## 6               0.07684880
## 7               0.01429787
## 8               0.01979849
## 9               0.01112386
## 10              0.19991617
## 11              0.02912340
## 12              0.15234282
## 13              0.01731968
## 
## Residual Deviance: 677375.5 
## AIC: 677495.5

With our training data, we achieve an accuracy of 30%. Looking at our confusion matrix for this model, it is a bit less biased towards predicting a certain category. This makes sense because there does not seem to be one predominant cause of the wildfires in our dataset.

## [1] 30.12

On the test data, we achieve a similar accuracy of 30%.

## [1] 29.85

2. Regression for long term

We also wanted to evaluate the long-term effect of natural factors on wildfires. We tried to predict the number of wildfires, the average fire size, and the total burned area from these natural factors.

2.1 Regression for cases, fire size, and total burned area

From the corrplot, we find that the number of cases is strongly correlated with the environmental variables, while the fire size and the total area burned show some correlation with them. We first fit linear models to these response variables, but the results suggest that the models do not fit well. After taking the log of each response variable, the models improve.

Linear Regression : log(n) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic
                             Estimate  Std. Error     t value    Pr(>|t|)
(Intercept)                    8.8261      0.2806     31.4571           0
tair_day_livneh_vic            0.0526      0.0068      7.7415           0
soilmoist1_day_livneh_vic     -0.2322      0.0126    -18.4468           0
Linear Regression : log(FIRE_SIZE) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic
                             Estimate  Std. Error     t value    Pr(>|t|)
(Intercept)                    6.4512      0.8955      7.2041      0.0000
tair_day_livneh_vic            0.0409      0.0217      1.8858      0.0604
soilmoist1_day_livneh_vic     -0.2994      0.0402     -7.4540      0.0000
Linear Regression : log(totalarea) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic
                             Estimate  Std. Error     t value    Pr(>|t|)
(Intercept)                   15.2773      0.9527     16.0351       0e+00
tair_day_livneh_vic            0.0936      0.0231      4.0523       1e-04
soilmoist1_day_livneh_vic     -0.5316      0.0427    -12.4385       0e+00

The R-squared value is 0.9 for the model of cases, 0.5377 for the model of average fire size, and 0.7825 for the model of total burned area. From the diagnostic plots, we found that the cases and burned area models fit the linear model assumptions well.
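
Because the response is log-transformed, each slope acts multiplicatively on the original scale: a one-unit increase in a predictor multiplies the expected response by exp(slope), holding the other predictor fixed. The Python sketch below converts the two slopes of the `log(n)` model into approximate percentage changes; the helper name `percent_change` is an illustrative assumption.

```python
import math

# Slopes copied from the log(n) fit reported above.
slopes = {
    "tair_day_livneh_vic": 0.0526,
    "soilmoist1_day_livneh_vic": -0.2322,
}


def percent_change(beta, delta=1.0):
    """Percent change in the untransformed response for a `delta` increase in x."""
    return (math.exp(beta * delta) - 1.0) * 100.0


for name, beta in slopes.items():
    print(f"{name}: {percent_change(beta):+.1f}% in n per one-unit increase")
```

Read this way, the fit associates each additional unit of temperature with roughly 5% more fires and each additional unit of soil moisture with roughly 21% fewer fires, under the model's assumptions.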

2.2 Large wildfire (class C or above) model

We build models to predict the number of large wildfires, their average fire size, and the total burned area.

## 
## Call:
## lm(formula = log(n) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic, 
##     data = joined4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.82435 -0.40529 -0.01388  0.36890  1.68090 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                6.16436    0.47725   12.92  < 2e-16 ***
## tair_day_livneh_vic        0.09035    0.01117    8.09 2.67e-14 ***
## soilmoist1_day_livneh_vic -0.28469    0.02194  -12.98  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5844 on 248 degrees of freedom
## Multiple R-squared:  0.8616, Adjusted R-squared:  0.8605 
## F-statistic:   772 on 2 and 248 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = log(FIRE_SIZE) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic, 
##     data = joined4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1378 -0.7470 -0.1646  0.5175  5.0328 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                7.44918    0.93222   7.991 5.06e-14 ***
## tair_day_livneh_vic        0.04067    0.02182   1.864   0.0635 .  
## soilmoist1_day_livneh_vic -0.17695    0.04285  -4.130 4.96e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.141 on 248 degrees of freedom
## Multiple R-squared:  0.3366, Adjusted R-squared:  0.3313 
## F-statistic: 62.92 on 2 and 248 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = log(totalarea) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic, 
##     data = joined4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0838 -0.8558 -0.1446  0.8036  4.5455 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               13.61355    1.10190  12.355  < 2e-16 ***
## tair_day_livneh_vic        0.13102    0.02579   5.081 7.39e-07 ***
## soilmoist1_day_livneh_vic -0.46165    0.05064  -9.115  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.349 on 248 degrees of freedom
## Multiple R-squared:  0.7391, Adjusted R-squared:  0.737 
## F-statistic: 351.2 on 2 and 248 DF,  p-value: < 2.2e-16

The R-squared value is 0.8616 for the model of cases, 0.3366 for the model of average fire size, and 0.7391 for the model of total burned area. From the diagnostic plots, we found that the model of large fire cases does not fit well because the residuals are not consistent, and the Q-Q plot for the model of average large fire size shows some departure from normality.
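
The adjusted R-squared values and F-statistics in the three summaries follow directly from the multiple R-squared, the number of predictors, and the residual degrees of freedom. As a consistency check, the Python sketch below recomputes them from the rounded R-squared values; the function names are illustrative, and small differences from the printed output are expected because the inputs are rounded to four digits.

```python
def adjusted_r2(r2, n_obs, n_pred):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)


def f_statistic(r2, n_obs, n_pred):
    """Overall F-statistic of a linear model expressed through R-squared."""
    df_resid = n_obs - n_pred - 1
    return (r2 / n_pred) / ((1.0 - r2) / df_resid)


n_pred = 2                 # temperature and soil moisture
n_obs = 248 + n_pred + 1   # residual df reported above is 248, so n = 251

# Multiple R-squared values copied from the three summaries above.
models = {"log(n)": 0.8616, "log(FIRE_SIZE)": 0.3366, "log(totalarea)": 0.7391}

results = {}
for label, r2 in models.items():
    results[label] = (adjusted_r2(r2, n_obs, n_pred), f_statistic(r2, n_obs, n_pred))
    adj, f_val = results[label]
    print(f"{label}: adjusted R2 = {adj:.4f}, F = {f_val:.1f}")
```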

3. Regression of wildfires with different causes (natural or people-caused)

3.1 Model for all fire incidents

## 
## Call:
## lm(formula = log(n) ~ Avg_Temp + Avg_SoilMoisture + Avg_Rainfall, 
##     data = summary_peoplecaused)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26039 -0.27735 -0.01591  0.28089  1.34352 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.024630   0.333811  21.044  < 2e-16 ***
## Avg_Temp          0.054946   0.007622   7.209 6.14e-12 ***
## Avg_SoilMoisture -0.206673   0.016432 -12.578  < 2e-16 ***
## Avg_Rainfall     -0.069238   0.032702  -2.117   0.0352 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4115 on 260 degrees of freedom
## Multiple R-squared:  0.8758, Adjusted R-squared:  0.8744 
## F-statistic: 611.2 on 3 and 260 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = log(n) ~ Avg_Temp + Avg_SoilMoisture + Avg_Rainfall, 
##     data = summary_nature)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.16534 -0.30546  0.00072  0.30435  1.34899 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.748455   0.394514  19.640  < 2e-16 ***
## Avg_Temp          0.063288   0.009008   7.025 1.87e-11 ***
## Avg_SoilMoisture -0.223913   0.019420 -11.530  < 2e-16 ***
## Avg_Rainfall     -0.109160   0.038649  -2.824   0.0051 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4863 on 260 degrees of freedom
## Multiple R-squared:  0.8652, Adjusted R-squared:  0.8637 
## F-statistic: 556.3 on 3 and 260 DF,  p-value: < 2.2e-16
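
As a rough guide to how precisely the rainfall effect is estimated in the two fits above, the Python sketch below builds approximate 95% confidence intervals from the reported estimates and standard errors. It uses a normal quantile as an approximation to the t quantile with 260 degrees of freedom, which is an assumption of this sketch; the helper name `conf_int` is also illustrative.

```python
from statistics import NormalDist

# Normal approximation to the t quantile with 260 residual degrees of freedom.
Z = NormalDist().inv_cdf(0.975)

# (estimate, standard error) for Avg_Rainfall, copied from the two summaries above.
rainfall = {
    "people-caused": (-0.069238, 0.032702),
    "natural": (-0.109160, 0.038649),
}


def conf_int(estimate, std_error, z=Z):
    """Approximate two-sided 95% interval: estimate plus or minus z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


intervals = {label: conf_int(est, se) for label, (est, se) in rainfall.items()}
for label, (low, high) in intervals.items():
    print(f"{label}: [{low:.4f}, {high:.4f}]")
```

Both intervals lie below zero, in line with the significance codes in the summaries, and they overlap substantially, so these two fits alone do not show a clear difference in the rainfall effect between the two cause groups.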

3.2 Model for fire size

## 
## Call:
## lm(formula = log(FIRE_SIZE) ~ Avg_Temp + Avg_SoilMoisture + Avg_Rainfall, 
##     data = summary_nature)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6442 -1.0920 -0.1149  0.7720  4.6131 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       7.06866    1.15696   6.110 3.63e-09 ***
## Avg_Temp          0.03381    0.02642   1.280    0.202    
## Avg_SoilMoisture -0.34235    0.05695  -6.011 6.20e-09 ***
## Avg_Rainfall     -0.13530    0.11334  -1.194    0.234    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.426 on 260 degrees of freedom
## Multiple R-squared:  0.522,  Adjusted R-squared:  0.5165 
## F-statistic: 94.65 on 3 and 260 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = log(FIRE_SIZE) ~ Avg_Temp + Avg_SoilMoisture + Avg_Rainfall, 
##     data = summary_peoplecaused)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0738 -1.2231 -0.2220  0.8184  5.4271 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       4.75860    1.19499   3.982 8.87e-05 ***
## Avg_Temp          0.07681    0.02729   2.815  0.00525 ** 
## Avg_SoilMoisture -0.25909    0.05882  -4.405 1.55e-05 ***
## Avg_Rainfall     -0.03634    0.11707  -0.310  0.75649    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.473 on 260 degrees of freedom
## Multiple R-squared:  0.4676, Adjusted R-squared:  0.4615 
## F-statistic: 76.13 on 3 and 260 DF,  p-value: < 2.2e-16

3.3 Model for burning area

## 
## Call:
## lm(formula = log(FIRE_SIZE * n) ~ Avg_Temp + Avg_SoilMoisture + 
##     Avg_Rainfall, data = summary_nature)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8476 -0.9807 -0.0634  0.8797  5.1056 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      14.81712    1.24075  11.942  < 2e-16 ***
## Avg_Temp          0.09710    0.02833   3.427 0.000708 ***
## Avg_SoilMoisture -0.56626    0.06108  -9.271  < 2e-16 ***
## Avg_Rainfall     -0.24446    0.12155  -2.011 0.045338 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.529 on 260 degrees of freedom
## Multiple R-squared:  0.7597, Adjusted R-squared:  0.7569 
## F-statistic:   274 on 3 and 260 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = log(FIRE_SIZE * n) ~ Avg_Temp + Avg_SoilMoisture + 
##     Avg_Rainfall, data = summary_peoplecaused)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2742 -1.1769 -0.1117  1.0471  5.4572 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      11.78323    1.28919   9.140  < 2e-16 ***
## Avg_Temp          0.13176    0.02944   4.476 1.14e-05 ***
## Avg_SoilMoisture -0.46576    0.06346  -7.339 2.75e-12 ***
## Avg_Rainfall     -0.10558    0.12630  -0.836    0.404    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.589 on 260 degrees of freedom
## Multiple R-squared:  0.7077, Adjusted R-squared:  0.7043 
## F-statistic: 209.9 on 3 and 260 DF,  p-value: < 2.2e-16
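
Since log(FIRE_SIZE * n) = log(FIRE_SIZE) + log(n), and all three models in this section use the same predictors on the same rows, the least-squares coefficients of the burned-area model should equal the sum of the coefficients of the count model and the fire-size model. The Python sketch below checks this for the `summary_nature` fits using the coefficients printed above; the dictionary names are illustrative.

```python
# Coefficients copied from the three summary_nature fits above.
count_fit = {"(Intercept)": 7.748455, "Avg_Temp": 0.063288,
             "Avg_SoilMoisture": -0.223913, "Avg_Rainfall": -0.109160}
size_fit = {"(Intercept)": 7.06866, "Avg_Temp": 0.03381,
            "Avg_SoilMoisture": -0.34235, "Avg_Rainfall": -0.13530}
area_fit = {"(Intercept)": 14.81712, "Avg_Temp": 0.09710,
            "Avg_SoilMoisture": -0.56626, "Avg_Rainfall": -0.24446}

# Compare the sum of the count and size coefficients with the area coefficients.
for term, reported in area_fit.items():
    combined = count_fit[term] + size_fit[term]
    print(f"{term:17s} count+size = {combined:9.5f}   area model = {reported:9.5f}")

# Largest discrepancy; anything this small is explained by rounding in the output.
max_gap = max(abs(count_fit[t] + size_fit[t] - area_fit[t]) for t in area_fit)
print(f"largest gap: {max_gap:.6f}")
```

The same identity holds for the `summary_peoplecaused` fits, so the burned-area models add no independent information beyond the count and fire-size models; they are a convenient way to read off the combined effect.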